
Conversation

@Neer393 (Contributor) commented on Aug 8, 2025:

What changes were proposed in this pull request?

Modified TableFetcher to return Table objects instead of table names. This reduces the number of MSC (metastore client) calls.
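
A minimal, hypothetical sketch of the interface-level difference is below; TableFetcherSketch and the method names are illustrative only, not the actual Hive signatures.

import java.util.List;

// Illustrative only: the real TableFetcher lives in the Hive code base and its
// exact signatures may differ.
interface TableFetcherSketch {
  // Old shape: only names are returned, so callers need one extra metastore (MSC)
  // call per table to obtain the Table object.
  List<String> getTableNames() throws Exception;

  // New shape: the Table objects themselves are returned, fetched in batches,
  // so callers avoid the per-table MSC call.
  List<org.apache.hadoop.hive.metastore.api.Table> getTables() throws Exception;
}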

Why are the changes needed?

It is an improvement that reduces the number of MSC calls.

Does this PR introduce any user-facing change?

No, it just adds a new method that returns tables as objects.

How was this patch tested?

Locally, by executing the unit tests.

@Neer393 (Contributor, author) commented on Aug 11, 2025:

Hi @vikramahuja1001 @deniskuzZ @abstractdog, can we have this reviewed?

@Neer393 (Contributor, author) commented on Aug 13, 2025:

@deniskuzZ @okumin all comments have been addressed and all checks have passed.
We are good to merge this 👍

@vikramahuja1001 (Contributor) commented:
+1 (non-binding)

@okumin (Contributor) left a review comment:

+1

@okumin merged commit ac3ea05 into apache:master on Aug 15, 2025 (6 checks passed).
@okumin (Contributor) commented on Aug 15, 2025:

Merged. @Neer393 Thanks for your contribution! @vikramahuja1001 @deniskuzZ Thanks for your review!

List<String> databases = client.getDatabases(catalogName, dbPattern);

for (String db : databases) {
  List<String> tablesNames = getTableNamesForDatabase(catalogName, db);
@deniskuzZ (Member) commented on Aug 15, 2025:

@Neer393 I don't understand what you have optimized here.
You are still making multiple calls: one to get the table names and another to get the table objects. Why not get the table objects directly?

Also, have you considered the memory impact of loading everything into the heap? I don't think that is a robust solution; it can potentially lead to OOM. You could have iterated over TableIterable instead.
cc @dengzhhu653, @wecharyu

@Neer393 (Contributor, author) commented on Aug 15, 2025:

The earlier implementation made one MSC call to get the table names and then one MSC call per table name to get the corresponding HMS Table object.

The new implementation reduces this: one MSC call fetches all table names, and then, via TableIterable, the table objects are retrieved in batches, so the number of MSC calls for the objects becomes roughly (number of tables) / BATCH_MAX_RETRIEVE (config value, default 300).

So in the old implementation the number of MSC calls was 1 + (number of tables),
whereas in the new implementation it is 1 + ceil(number of tables / BATCH_MAX_RETRIEVE).
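
As a concrete example, 30,000 tables with the default batch size of 300 means 1 call for the names plus 100 calls for the objects, instead of 1 + 30,000. Below is a rough sketch of the batching idea, assuming IMetaStoreClient#getTableObjectsByName is available; the PR itself relies on TableIterable, which performs the same kind of batched lookup.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;

public final class BatchedTableFetchSketch {

  // One MSC call per batch: for N tables this issues ceil(N / batchSize) calls
  // instead of N single-table calls.
  static List<Table> fetchInBatches(IMetaStoreClient msc, String dbName,
      List<String> tableNames, int batchSize) throws Exception {
    List<Table> result = new ArrayList<>(tableNames.size());
    for (int from = 0; from < tableNames.size(); from += batchSize) {
      int to = Math.min(from + batchSize, tableNames.size());
      result.addAll(msc.getTableObjectsByName(dbName, tableNames.subList(from, to)));
    }
    return result;
  }
}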

@Neer393 (Contributor, author) commented:

In an earlier revision I made the same proposal of fetching the table objects directly, using a direct HMS API endpoint such as listTableNamesByFilter, but that idea was dropped by @vikramahuja1001.

@deniskuzZ (Member) commented on Aug 15, 2025:

In order to use batching, you need the list of tables to fetch; that's OK. However, instead of working batch by batch, you load everything into memory.
Could you refactor to use an Iterable (i.e. make getTables return Iterable<Table>)?
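
A rough sketch of the suggested refactor, assuming the same batched getTableObjectsByName lookup as above: expose an Iterable<Table> that fetches one batch at a time, so only about batchSize Table objects are resident at once. Hive already ships a TableIterable with this behaviour; this stand-alone version only illustrates the idea and is not the class used in the PR.

import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;

public class LazyTableIterableSketch implements Iterable<Table> {
  private final IMetaStoreClient msc;
  private final String dbName;
  private final List<String> tableNames;
  private final int batchSize;

  public LazyTableIterableSketch(IMetaStoreClient msc, String dbName,
      List<String> tableNames, int batchSize) {
    this.msc = msc;
    this.dbName = dbName;
    this.tableNames = tableNames;
    this.batchSize = batchSize;
  }

  @Override
  public Iterator<Table> iterator() {
    return new Iterator<Table>() {
      private int nextIndex = 0;              // position in tableNames
      private Iterator<Table> batch = null;   // only the current batch is in memory

      @Override
      public boolean hasNext() {
        return (batch != null && batch.hasNext()) || nextIndex < tableNames.size();
      }

      @Override
      public Table next() {
        if (batch == null || !batch.hasNext()) {
          if (nextIndex >= tableNames.size()) {
            throw new NoSuchElementException();
          }
          int to = Math.min(nextIndex + batchSize, tableNames.size());
          try {
            // One MSC call per batch; earlier batches become garbage-collectable.
            batch = msc.getTableObjectsByName(dbName,
                tableNames.subList(nextIndex, to)).iterator();
          } catch (Exception e) {
            throw new RuntimeException("Failed to fetch table batch", e);
          }
          nextIndex = to;
        }
        return batch.next();
      }
    };
  }
}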

@Neer393 (Contributor, author) commented:

Okay, so for this fix should I create a new JIRA, or, since I am already working on https://issues.apache.org/jira/browse/HIVE-28974 which is related to IcebergHouseKeeperService only, should I attach the fix to that JIRA?
Whatever you suggest is fine with me.

A contributor commented:

+1. Using the TableIterator is more reasonable, to avoid a possible OOM.

A contributor commented:

Let me summarize the points.

  • This PR reduces the number of API calls from O(num-tables) to O(num-tables / batch-size). It's neat 👍
  • This PR requires O(num-tables * tbl-size) space. We'd like to reduce that.
  • Regardless of this PR, we retain O(num-tables * length-of-table-name) space. Is that OK, or should we optimize it too?

@Neer393 (Contributor, author) commented:

AFAIK, it's a tradeoff between the number of MSC calls and memory: if we try to decrease the number of tables held in memory, we increase the number of MSC calls, and then there would be no point to this JIRA.
Please correct me if I am wrong, but that is my understanding.

@deniskuzZ (Member) commented on Aug 19, 2025:

My concern was about the fetch logic, where we load all Hive table objects into memory instead of using the batch iterator.

We also make O(num-tables) calls to load the Iceberg tables. Can we optimize here? We could put those into a separate cache; maybe we could use CachingCatalog instead?

tableCache.get(tableName, key -> IcebergTableUtil.getTable(conf, table))
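
A minimal sketch of that caching idea, assuming a Caffeine cache (which the tableCache.get(key, mappingFunction) pattern above suggests); the key format, size bound, and expiry are illustrative, and Iceberg's CachingCatalog could serve the same purpose.

import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class IcebergTableCacheSketch {
  // Bounded cache so repeated housekeeping runs reuse already-loaded Iceberg tables
  // instead of paying O(num-tables) loads every cycle. Bounds are illustrative.
  private final Cache<String, org.apache.iceberg.Table> tableCache = Caffeine.newBuilder()
      .maximumSize(1_000)
      .expireAfterWrite(10, TimeUnit.MINUTES)
      .build();

  public org.apache.iceberg.Table getOrLoad(String fullTableName,
      Function<String, org.apache.iceberg.Table> loader) {
    // The loader (e.g. a call into IcebergTableUtil) runs only on a cache miss.
    return tableCache.get(fullTableName, loader);
  }
}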

      catalogName, dbPattern, tablePattern, e);
}
for (org.apache.hadoop.hive.metastore.api.Table table : tables) {
  expireSnapshotsForTable(getIcebergTable(table));
@okumin (Contributor) commented on Aug 16, 2025:

I recall that we would like to retain the try-catch: we intentionally added it to avoid skipping everything when a single expiration fails.
See also: #5786 (comment)
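
A small sketch of the pattern being defended, with the try-catch kept inside the per-table loop so a single failing expiration is logged and skipped instead of aborting the whole run; the expire callback stands in for the real expireSnapshotsForTable(getIcebergTable(table)) call, and all names are illustrative.

import java.util.List;
import java.util.function.Consumer;
import org.apache.hadoop.hive.metastore.api.Table;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class PerTableErrorIsolationSketch {
  private static final Logger LOG = LoggerFactory.getLogger(PerTableErrorIsolationSketch.class);

  // Illustrative only: isolates failures per table rather than per run.
  static void expireAll(List<Table> tables, Consumer<Table> expire) {
    for (Table table : tables) {
      try {
        expire.accept(table);
      } catch (RuntimeException e) {
        LOG.error("Snapshot expiry failed for {}.{}; continuing with remaining tables",
            table.getDbName(), table.getTableName(), e);
      }
    }
  }
}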

@okumin (Contributor) commented on Aug 16, 2025:

@Neer393 @deniskuzZ I created a revert PR because we found two issues to be discussed.
#6033

@Neer393 (Contributor, author) commented on Aug 16, 2025:

Okay. In that case, let me close the redundant JIRA that I created for this fix.
We will now ship it once and for all under HIVE-28952 only.

@deniskuzZ (Member) commented on Aug 19, 2025:

> Okay so for this fix should I create a new JIRA or as I am working on https://issues.apache.org/jira/browse/HIVE-28974 which is related to IcebergHouseKeeperService only should I attach the fix in this JIRA?

@Neer393 a HIVE-28952 addendum should be OK.
One ask: can we use TableIterator in IcebergTableOptimizer as well?

@Neer393 (Contributor, author) commented on Aug 20, 2025:

> Okay so for this fix should I create a new JIRA or as I am working on https://issues.apache.org/jira/browse/HIVE-28974 which is related to IcebergHouseKeeperService only should I attach the fix in this JIRA?
>
> @Neer393 a HIVE-28952 addendum should be OK. One ask: can we use TableIterator in IcebergTableOptimizer as well?

Okay, in that case @okumin please do not revert the PR. As per @deniskuzZ's suggestion, I will add an addendum to it.

@deniskuzZ using TableIterator in IcebergTableOptimizer for getTableNames() is not a big deal. We can add it, but the question is whether we want to: we introduced TableIterator because a single list of HMS API Table objects would be a burden (table objects are large), whereas TableName objects are very small, holding only three strings (table name, database name, and catalog name).

So I don't think we strictly need TableIterator for the TableOptimizer, but if you say so, I can add it. The call is yours.

@deniskuzZ (Member) commented:

@Neer393 IcebergTableOptimizer first retrieves the list of table names, then for each name it loads the corresponding Hive table object (O(num-tables) getHiveTable() calls), followed by loading the Iceberg tables (another O(num-tables) calls).
This is the same suboptimal implementation as in IcebergHouseKeeperService.

@Neer393 (Contributor, author) commented on Aug 20, 2025:

> @Neer393 IcebergTableOptimizer first retrieves the list of table names, then for each name it loads the corresponding Hive table object (O(num-tables) getHiveTable() calls), followed by loading the Iceberg tables (another O(num-tables) calls). This is the same suboptimal implementation as in IcebergHouseKeeperService.

Oh, okay. In that case, yes.
I will make the changes there as well.
